# Self-supervised pre-training
## RNAErnie
Publisher: multimolecule · Task: Molecular Model · Framework: PyTorch · Downloads: 11.00k · Likes: 1

RNAErnie is a model for self-supervised pre-training on non-coding RNA sequences. It uses a multi-stage masked language modeling objective to learn feature representations for RNA research.
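
A rough sketch of extracting RNA sequence embeddings from this checkpoint, assuming the `multimolecule` package is installed and exposes `RnaTokenizer` and `RnaErnieModel` for the `multimolecule/rnaernie` repository (these class and repository names follow the package's usual conventions and should be checked against its documentation):

```python
# Sketch: RNA feature extraction with RNAErnie via the multimolecule package.
# Assumes `pip install multimolecule`; class and repository names below are
# assumptions based on the package's naming conventions.
import torch
from multimolecule import RnaTokenizer, RnaErnieModel

tokenizer = RnaTokenizer.from_pretrained("multimolecule/rnaernie")
model = RnaErnieModel.from_pretrained("multimolecule/rnaernie")

inputs = tokenizer("UAGCUUAUCAGACUGAUGUUGA", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```
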
## ProphetNet Large Uncased
Publisher: microsoft · Task: Large Language Model (English) · Downloads: 5,528 · Likes: 5

ProphetNet is a sequence-to-sequence pre-trained language model whose self-supervised objective is future n-gram prediction: at each step it learns to predict the next several tokens rather than only the next one.
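
As a rough usage sketch, the checkpoint can be loaded with the Transformers ProphetNet classes for sequence-to-sequence generation; the repository id `microsoft/prophetnet-large-uncased` is assumed from the entry above, and in practice the pre-trained model is normally fine-tuned (e.g. for summarization) before its generations are useful:

```python
# Sketch: loading ProphetNet as a sequence-to-sequence model with Transformers.
# The pre-trained checkpoint is usually fine-tuned on a downstream task first.
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased")

text = "self-supervised pre-training learns representations from unlabeled data ."
input_ids = tokenizer(text, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, num_beams=4, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
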
## Mahadhwani Pretrained Conformer
Publisher: ai4bharat · License: MIT · Task: Speech Recognition · Downloads: 349 · Likes: 1

A pre-trained Conformer encoder trained with self-supervised learning, supporting automatic speech recognition for the 22 scheduled Indian languages.
## Dasheng Base
Publisher: mispeech · License: Apache-2.0 · Task: Audio Classification · Framework: Transformers · Downloads: 273 · Likes: 1

A large-scale general-purpose audio encoder trained with self-supervised learning, able to process audio from multiple domains, including speech, music, and environmental sounds.
## GPT-2 Demo
Publisher: demo-leaderboard · License: Other · Task: Large Language Model · Framework: Transformers · Downloads: 19.21k · Likes: 1

GPT-2 is a self-supervised pre-trained language model based on the Transformer architecture that excels at text generation.
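
Since this entry appears to be a demo copy of the standard GPT-2 weights, a minimal text-generation sketch with the Transformers pipeline looks like the following (the canonical `gpt2` checkpoint is used here; substitute the demo repository id if that is the one you need):

```python
# Sketch: GPT-2 text generation with the Transformers pipeline.
# Uses the canonical `gpt2` checkpoint; swap in the demo repository id if needed.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
outputs = generator("Self-supervised pre-training is", max_new_tokens=30, num_return_sequences=1)
print(outputs[0]["generated_text"])
```
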
## RegNetY 1280.seer
Publisher: timm · License: Other · Task: Image Classification · Framework: Transformers · Downloads: 62 · Likes: 0

RegNetY-128GF feature extraction model, pre-trained with the self-supervised SEER method on two billion random web images.
## ConvNeXtV2 Pico.fcmae
Publisher: timm · Task: Image Classification · Framework: Transformers · Downloads: 82 · Likes: 0

ConvNeXt-V2 self-supervised feature representation model, pre-trained with the Fully Convolutional Masked Autoencoder (FCMAE) framework, suitable for image classification and feature extraction.
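
A minimal sketch of using this checkpoint as a feature extractor through `timm`, assuming the model name `convnextv2_pico.fcmae`; the FCMAE weights ship without a classifier head, so `num_classes=0` yields pooled embeddings:

```python
# Sketch: image feature extraction with a ConvNeXt-V2 FCMAE checkpoint via timm.
# num_classes=0 drops the classification head and returns pooled features.
import timm
import torch
from PIL import Image

model = timm.create_model("convnextv2_pico.fcmae", pretrained=True, num_classes=0)
model.eval()

config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("example.jpg").convert("RGB")  # any local image
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))

print(features.shape)  # (1, embedding dimension)
```
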
## ConvNeXtV2 Large.fcmae
Publisher: timm · Task: Image Classification · Framework: Transformers · Downloads: 314 · Likes: 0

A self-supervised feature representation model based on ConvNeXt-V2, pre-trained with the Fully Convolutional Masked Autoencoder (FCMAE) framework, suitable for image classification and feature extraction.
## ViT-MSN Large 7
Publisher: facebook · License: Apache-2.0 · Task: Image Classification · Framework: Transformers · Downloads: 67 · Likes: 2

A Vision Transformer pre-trained with the MSN (Masked Siamese Networks) method. It performs well in few-shot scenarios and is suited to tasks such as image classification.
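
A rough sketch of pulling image embeddings from the MSN-pre-trained backbone with Transformers; the repository id `facebook/vit-msn-large-7` is assumed from the entry and should be verified on the hub, and for classification the backbone would normally be wrapped in `ViTMSNForImageClassification` and fine-tuned on a small labeled set:

```python
# Sketch: extracting features from a ViT-MSN backbone with Transformers.
# The repository id below is assumed from the entry; verify it on the hub.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTMSNModel

processor = AutoImageProcessor.from_pretrained("facebook/vit-msn-large-7")
model = ViTMSNModel.from_pretrained("facebook/vit-msn-large-7")

image = Image.open("example.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, number of patches + 1, hidden size)
```
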
## SwinV2 Small Patch4 Window8 256
Publisher: microsoft · License: Apache-2.0 · Task: Image Classification · Framework: Transformers · Downloads: 1,836 · Likes: 0

Swin Transformer v2 is a vision Transformer that processes images efficiently through hierarchical feature maps and local-window self-attention.
## ViWav2Vec2 Base 3k
Publisher: dragonSwing · Task: Speech Recognition · Framework: Transformers, Other · Downloads: 41 · Likes: 2

A Wav2Vec2 base model pre-trained on 3,000 hours of Vietnamese speech. It is intended for Vietnamese speech recognition and must be fine-tuned on a downstream task before use.
## RegNet Y 640 SEER In1k
Publisher: facebook · License: Apache-2.0 · Task: Image Classification · Framework: Transformers · Downloads: 21 · Likes: 0

A RegNet model pre-trained in a self-supervised manner (SEER) on billions of random web images and then fine-tuned on ImageNet-1k.
## XLM-Align Base
Publisher: microsoft · Task: Large Language Model · Framework: Transformers · Downloads: 354 · Likes: 9

XLM-Align is a pre-trained cross-lingual model covering 94 languages that improves on prior cross-lingual pre-training through self-labeled word alignment.
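
XLM-Align follows the usual XLM-R-style interface, so a minimal feature-extraction sketch with the Transformers Auto classes looks like this (repository id `microsoft/xlm-align-base` assumed from the entry):

```python
# Sketch: cross-lingual sentence features from XLM-Align via Transformers Auto classes.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/xlm-align-base")
model = AutoModel.from_pretrained("microsoft/xlm-align-base")

inputs = tokenizer("Self-supervised pre-training works across languages.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence length, hidden size)
```
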
## BEiT Large Patch16 512
Publisher: microsoft · License: Apache-2.0 · Task: Image Classification · Downloads: 683 · Likes: 11

BEiT is a vision Transformer-based image classification model, pre-trained in a self-supervised manner on ImageNet-21k and fine-tuned on ImageNet-1k.
## Wav2Vec2 Large 960h Lv60 Self
Publisher: facebook · License: Apache-2.0 · Task: Speech Recognition (English) · Downloads: 56.00k · Likes: 146

Facebook's Wav2Vec2 large model, pre-trained and fine-tuned on 960 hours of Libri-Light and LibriSpeech audio with a self-training objective, achieving state-of-the-art results on the LibriSpeech test sets.
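
A minimal transcription sketch with this checkpoint using the standard Transformers CTC interface; the input is assumed to be a 16 kHz mono waveform:

```python
# Sketch: English speech recognition with facebook/wav2vec2-large-960h-lv60-self.
# The model expects 16 kHz mono audio.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

waveform = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence at 16 kHz
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```
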
## BEiT Large Finetuned ADE 640 640
Publisher: microsoft · License: Apache-2.0 · Task: Image Segmentation · Framework: Transformers · Downloads: 14.97k · Likes: 14

A BEiT model based on the Vision Transformer architecture for semantic segmentation, self-supervised pre-trained and then fine-tuned on the ADE20k dataset.
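
A rough semantic-segmentation sketch with Transformers, assuming the repository id `microsoft/beit-large-finetuned-ade-640-640` from the entry above:

```python
# Sketch: ADE20k semantic segmentation with a fine-tuned BEiT model.
import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForSemanticSegmentation

checkpoint = "microsoft/beit-large-finetuned-ade-640-640"  # repository id assumed from the entry
processor = BeitImageProcessor.from_pretrained(checkpoint)
model = BeitForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("scene.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, number of ADE20k classes, height/4, width/4)

segmentation = logits.argmax(dim=1)[0]  # per-pixel class indices at reduced resolution
print(segmentation.shape)
```
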
## BEiT Base Patch16 224
Publisher: nielsr · License: Apache-2.0 · Task: Image Classification · Downloads: 28 · Likes: 0

BEiT is a vision model based on image Transformers that uses a BERT-like self-supervised pre-training method. It is first pre-trained and fine-tuned on ImageNet-22k, then further fine-tuned on ImageNet-1k.
## BEiT Base Patch16 224
Publisher: microsoft · License: Apache-2.0 · Task: Image Classification · Downloads: 58.34k · Likes: 9

A Vision Transformer-based BEiT model pre-trained on ImageNet-21k with self-supervised learning and fine-tuned on ImageNet-1k for image classification.
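
A minimal classification sketch with Transformers for this checkpoint:

```python
# Sketch: ImageNet-1k classification with microsoft/beit-base-patch16-224.
import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForImageClassification

processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

image = Image.open("cat.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[logits.argmax(-1).item()])  # predicted ImageNet-1k label
```
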
## ProtT5-XL-BFD
Publisher: Rostlab · Task: Protein Model · Framework: Transformers · Downloads: 605 · Likes: 10

ProtT5-XL-BFD is a self-supervised model pre-trained on protein sequences using the T5 architecture. It was trained on 2.1 billion protein sequences and is intended for protein feature extraction and downstream fine-tuning.
## ProtT5-XL-UniRef50
Publisher: Rostlab · Task: Protein Model · Framework: Transformers · Downloads: 78.45k · Likes: 44

A protein sequence pre-training model based on the T5-3B architecture that captures biophysical properties of proteins through self-supervised learning.
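
A minimal sketch of extracting per-residue embeddings with the encoder half of this checkpoint, following the usual Rostlab conventions (amino acids separated by spaces, rare residues mapped to X):

```python
# Sketch: per-residue embeddings from ProtT5-XL-UniRef50 using only the T5 encoder.
# Note: this is a ~3B-parameter model, so the download and memory footprint are large.
import re
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50")
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50")
model.eval()

sequence = "MSILVTRPSPAGEEL"                 # example protein sequence
sequence = re.sub(r"[UZOB]", "X", sequence)  # map rare amino acids to X
sequence = " ".join(sequence)                # the tokenizer expects space-separated residues

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state  # (1, sequence length + 1, hidden size)

print(embeddings.shape)
```
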